{
  "cells": [
    {
      "cell_type": "markdown",
      "source": [
        "# External database integration\n",
        "\n",
        "txtai provides many default settings to help a developer quickly get started. For example, metadata is stored in SQLite, dense vectors in Faiss, sparse vectors in a terms index and graph data with NetworkX.\n",
        "\n",
        "Each of these components is customizable and can be swapped with alternate implementations. This has been covered in [several previous notebooks](https://neuml.github.io/txtai/examples/#architecture).\n",
        "\n",
        "This notebook will introduce how to store metadata in client-server RDBMS systems. In addition to SQLite and DuckDB, any [SQLAlchemy-supported database](https://docs.sqlalchemy.org/en/20/dialects/) with [JSON support](https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqlalchemy.types.JSON) can now be used."
      ],
      "metadata": {
        "id": "781Mrl0FtFR8"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Install dependencies\n",
        "\n",
        "Install `txtai` and all dependencies."
      ],
      "metadata": {
        "id": "VG8sklTPwlJa"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "XrqaDHiCnFnr"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "\n",
        "!pip install git+https://github.com/neuml/txtai#egg=txtai[database] elasticsearch==7.10.1 datasets"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Install Postgres\n",
        "\n",
        "Next, we'll install Postgres and start a Postgres instance."
      ],
      "metadata": {
        "id": "6pc469NKwyMu"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "1JZhUIMmkmil"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "\n",
        "# Install and start Postgres\n",
        "!apt-get update && apt-get install postgresql\n",
        "!service postgresql start\n",
        "!sudo -u postgres psql -U postgres -c \"ALTER USER postgres PASSWORD 'postgres';\""
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Load a dataset\n",
        "\n",
        "Now we're ready to load a dataset. We'll use the `ag_news` dataset. This dataset consists of 120,000 news headlines."
      ],
      "metadata": {
        "id": "lS16J4QBxDJP"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "TMVkdPSgNlD0"
      },
      "outputs": [],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "# Load dataset\n",
        "ds = load_dataset(\"ag_news\", split=\"train\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Build an Embeddings instance with Postgres\n",
        "\n",
        "Let's load this dataset into an embeddings database. We'll configure this instance to store metadata in Postgres. Note that the content parameter below is a [SQLAlchemy connection string](https://docs.sqlalchemy.org/en/20/core/engines.html).\n",
        "\n",
        "This embeddings database will use the default vector settings and build that index locally."
      ],
      "metadata": {
        "id": "iERZH9w1xQGC"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "OmnobNLGNhW4"
      },
      "outputs": [],
      "source": [
        "import txtai\n",
        "\n",
        "# Create embeddings\n",
        "embeddings = txtai.Embeddings(\n",
        "    content=\"postgresql+psycopg2://postgres:postgres@localhost/postgres\",\n",
        ")\n",
        "\n",
        "# Index dataset\n",
        "embeddings.index(ds[\"text\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's run a search query and see what comes back."
      ],
      "metadata": {
        "id": "GqOMCOT0xnnC"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "qD4tBdMKI4MX",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "109fe906-ca74-49c9-ed4e-922eaefd398e"
      },
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'id': '63561',\n",
              "  'text': 'Red Sox Beat Yankees 6-4 in 12 Innings BOSTON - Down to their last three outs of the season, the Boston Red Sox rallied - against Mariano Rivera, the New York Yankees and decades of disappointment. Bill Mueller singled home the tying run off Rivera in the ninth inning and David Ortiz homered against Paul Quantrill in the 12th, leading Boston to a 6-4 victory Sunday over the Yankees that avoided a four-game sweep in the AL championship series...',\n",
              "  'score': 0.8104304671287537},\n",
              " {'id': '63221',\n",
              "  'text': 'Red Sox Beat Yankees 6-4 in 12 Innings BOSTON - Down to their last three outs of the season, the Boston Red Sox rallied - against Mariano Rivera, the New York Yankees and decades of disappointment. Bill Mueller singled home the tying run off Rivera in the ninth inning and David Ortiz homered against Paul Quantrill in the 12th, leading Boston to a 6-4 victory over the Yankees on Sunday night that avoided a four-game sweep in the AL championship series...',\n",
              "  'score': 0.8097385168075562},\n",
              " {'id': '66861',\n",
              "  'text': 'Record-Breaking Red Sox Clinch World Series Berth  NEW YORK (Reuters) - The Boston Red Sox crushed the New  York Yankees 10-3 Wednesday to complete an historic comeback  victory over their arch-rivals by four games to three in the  American League Championship Series.',\n",
              "  'score': 0.8003846406936646}]"
            ]
          },
          "metadata": {},
          "execution_count": 24
        }
      ],
      "source": [
        "embeddings.search(\"red sox defeat yankees\")"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "As expected, we get the standard `id, text, score` fields with the top matches for the query. The difference though is that all the database metadata normally stored in a local SQLite file is now stored in a Postgres server.\n",
        "\n",
        "This opens up several possibilities such as row-level security. If a row isn't returned by the database, it won't be shown here. Alternatively, this search could optionally return only the ids and scores, which lets the user know a record exists they don't have access to.\n",
        "\n",
        "As with other supported databases, underlying database functions can be called from txtai SQL."
      ],
      "metadata": {
        "id": "SeV3i6vixs-e"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "embeddings.search(\"SELECT id, text, md5(text), score FROM txtai WHERE similar('red sox defeat yankees')\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "3PQAoPfvSyP1",
        "outputId": "1eed6cf0-6402-4760-b673-5798fc329d0e"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'id': '63561',\n",
              "  'text': 'Red Sox Beat Yankees 6-4 in 12 Innings BOSTON - Down to their last three outs of the season, the Boston Red Sox rallied - against Mariano Rivera, the New York Yankees and decades of disappointment. Bill Mueller singled home the tying run off Rivera in the ninth inning and David Ortiz homered against Paul Quantrill in the 12th, leading Boston to a 6-4 victory Sunday over the Yankees that avoided a four-game sweep in the AL championship series...',\n",
              "  'md5': '1e55a78fdf0cb3be3ef61df650f0a50f',\n",
              "  'score': 0.8104304671287537},\n",
              " {'id': '63221',\n",
              "  'text': 'Red Sox Beat Yankees 6-4 in 12 Innings BOSTON - Down to their last three outs of the season, the Boston Red Sox rallied - against Mariano Rivera, the New York Yankees and decades of disappointment. Bill Mueller singled home the tying run off Rivera in the ninth inning and David Ortiz homered against Paul Quantrill in the 12th, leading Boston to a 6-4 victory over the Yankees on Sunday night that avoided a four-game sweep in the AL championship series...',\n",
              "  'md5': 'a0417e1fc503a5a2945c8755b6fb18d5',\n",
              "  'score': 0.8097385168075562},\n",
              " {'id': '66861',\n",
              "  'text': 'Record-Breaking Red Sox Clinch World Series Berth  NEW YORK (Reuters) - The Boston Red Sox crushed the New  York Yankees 10-3 Wednesday to complete an historic comeback  victory over their arch-rivals by four games to three in the  American League Championship Series.',\n",
              "  'md5': '398a8508692aed109bd8c56f067a8083',\n",
              "  'score': 0.8003846406936646}]"
            ]
          },
          "metadata": {},
          "execution_count": 25
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Note the addition of the Postgres `md5` function to the query.\n",
        "\n",
        "Let's save and show the files in the embeddings database."
      ],
      "metadata": {
        "id": "lqkfoknYziiU"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "embeddings.save(\"vectors\")\n",
        "!ls -l vectors"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "gKTsjD3uS0YG",
        "outputId": "aa2ec6af-509b-468e-a7e9-11b550707073"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "total 183032\n",
            "-rw-r--r-- 1 root root       355 Sep  7 16:38 config\n",
            "-rw-r--r-- 1 root root 187420123 Sep  7 16:38 embeddings\n"
          ]
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Only the configuration and the local vectors index are stored in this case."
      ],
      "metadata": {
        "id": "ylGVrmL1ztG0"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# External indexing\n",
        "\n",
        "As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in [Weaviate](https://github.com/hsm207/weaviate-txtai) and [Qdrant](https://github.com/qdrant/qdrant-txtai) to name a few.\n",
        "\n",
        "Next, we'll build an example that stores metadata in Postgres and builds a sparse index with Elasticsearch."
      ],
      "metadata": {
        "id": "RUqaj-tVz19K"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Scoring component for Elasticsearch\n",
        "\n",
        "First, we need to define a custom scoring component for Elasticsearch. While could have used an existing integration, it's important to show that creating a new component isn't a large LOE (~70 lines of code). See below."
      ],
      "metadata": {
        "id": "OZLCQceF3US9"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "fbPd8uXZrekP"
      },
      "outputs": [],
      "source": [
        "from elasticsearch import Elasticsearch\n",
        "from elasticsearch.helpers import bulk\n",
        "\n",
        "from txtai.scoring import Scoring\n",
        "\n",
        "class Elastic(Scoring):\n",
        "  def __init__(self, config=None):\n",
        "    # Scoring configuration\n",
        "    self.config = config if config else {}\n",
        "\n",
        "    # Server parameters\n",
        "    self.url = self.config.get(\"url\", \"http://localhost:9200\")\n",
        "    self.indexname = self.config.get(\"indexname\", \"testindex\")\n",
        "\n",
        "    # Elasticsearch connection\n",
        "    self.connection = Elasticsearch(self.url)\n",
        "\n",
        "    self.terms = True\n",
        "    self.normalize = self.config.get(\"normalize\")\n",
        "\n",
        "  def insert(self, documents, index=None):\n",
        "    rows = []\n",
        "    for uid, document, tags in documents:\n",
        "        rows.append((index, document))\n",
        "\n",
        "        # Increment index\n",
        "        index = index + 1\n",
        "\n",
        "    bulk(self.connection, ({\"_index\": self.indexname, \"_id\": uid, \"text\": text} for uid, text in rows))\n",
        "\n",
        "  def index(self, documents=None):\n",
        "    self.connection.indices.refresh(index=self.indexname)\n",
        "\n",
        "  def search(self, query, limit=3):\n",
        "    return self.batchsearch([query], limit)\n",
        "\n",
        "  def batchsearch(self, queries, limit=3):\n",
        "    # Generate bulk queries\n",
        "    request = []\n",
        "    for query in queries:\n",
        "      req_head = {\"index\": self.indexname, \"search_type\": \"dfs_query_then_fetch\"}\n",
        "      req_body = {\n",
        "        \"_source\": False,\n",
        "        \"query\": {\"multi_match\": {\"query\": query, \"type\": \"best_fields\", \"fields\": [\"text\"], \"tie_breaker\": 0.5}},\n",
        "        \"size\": limit,\n",
        "      }\n",
        "      request.extend([req_head, req_body])\n",
        "\n",
        "      # Run ES query\n",
        "      response = self.connection.msearch(body=request, request_timeout=600)\n",
        "\n",
        "      # Read responses\n",
        "      results = []\n",
        "      for resp in response[\"responses\"]:\n",
        "        result = resp[\"hits\"][\"hits\"]\n",
        "        results.append([(r[\"_id\"], r[\"_score\"]) for r in result])\n",
        "\n",
        "      return results\n",
        "\n",
        "  def count(self):\n",
        "    response = self.connection.cat.count(self.indexname, params={\"format\": \"json\"})\n",
        "    return int(response[0][\"count\"])\n",
        "\n",
        "  def load(self, path):\n",
        "    # No local storage\n",
        "    pass\n",
        "\n",
        "  def save(self, path):\n",
        "    # No local storage\n",
        "    pass"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "## Elasticsearch server\n",
        "\n",
        "As with Postgres, we'll install and start an Elasticsearch instance."
      ],
      "metadata": {
        "id": "3Zu1en9A2fP7"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "bZFJGE9RpRTt"
      },
      "outputs": [],
      "source": [
        "%%capture\n",
        "import os\n",
        "\n",
        "# Download and extract elasticsearch\n",
        "os.system(\"wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-7.10.1-linux-x86_64.tar.gz\")\n",
        "os.system(\"tar -xzf elasticsearch-7.10.1-linux-x86_64.tar.gz\")\n",
        "os.system(\"chown -R daemon:daemon elasticsearch-7.10.1\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "oeCGDn5NpR9p"
      },
      "outputs": [],
      "source": [
        "from subprocess import Popen, PIPE, STDOUT\n",
        "\n",
        "# Start and wait for serverw\n",
        "server = Popen(['elasticsearch-7.10.1/bin/elasticsearch'], stdout=PIPE, stderr=STDOUT, preexec_fn=lambda: os.setuid(1))\n",
        "!sleep 30"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Let's build the index. The only difference from the previous example is setting the custom `scoring` component."
      ],
      "metadata": {
        "id": "l9WI7NJy4TXo"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "ZCQhwUVRnnEY"
      },
      "outputs": [],
      "source": [
        "import txtai\n",
        "\n",
        "# Create embeddings\n",
        "embeddings = txtai.Embeddings(\n",
        "    keyword=True,\n",
        "    content=\"postgresql+psycopg2://postgres:postgres@localhost/postgres\",\n",
        "    scoring= \"__main__.Elastic\"\n",
        ")\n",
        "\n",
        "# Index dataset\n",
        "embeddings.index(ds[\"text\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Below is the same search as shown before."
      ],
      "metadata": {
        "id": "JrfZTEyE4Q-i"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "embeddings.search(\"red sox defeat yankees\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "VFgIV_S8Stby",
        "outputId": "75f3b9ab-2805-4a3c-ca93-b843e21af439"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'id': '66954',\n",
              "  'text': 'Boston Red Sox make history Believe it, New England -- the Boston Red Sox are in the World Series. And they got there with the most unbelievable comeback of all, with four sweet swings after decades of defeat, shaming the dreaded New York Yankees.',\n",
              "  'score': 21.451942},\n",
              " {'id': '69577',\n",
              "  'text': 'Passing thoughts on Yankees-Red Sox series The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game. The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game and it wasn #39;t even close.',\n",
              "  'score': 20.923117},\n",
              " {'id': '67253',\n",
              "  'text': 'Sox Victorious At Last!! BOSTON -- After suffering decades of defeat and disappointment, the 2004 Boston Red Sox made history Wednesday night, beating the Yankees in the house that Ruth built and claiming the American League championship trophy.',\n",
              "  'score': 20.865997}]"
            ]
          },
          "metadata": {},
          "execution_count": 31
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "And once again we get the top matches. This time though the index is in Elasticsearch. Why are results and scores different? This is because this is a keyword index and it's using Elasticsearch's raw BM25 scores.\n",
        "\n",
        "One enhancement to this component would be adding score normalization as seen in the standard scoring components.\n",
        "\n",
        "For good measure, let's also show that the `md5` function can be called here too."
      ],
      "metadata": {
        "id": "PE9SAEDn4gDE"
      }
    },
    {
      "cell_type": "code",
      "source": [
        "embeddings.search(\"SELECT id, text, md5(text), score FROM txtai WHERE similar('red sox defeat yankees')\")"
      ],
      "metadata": {
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "id": "4oYAIFPqWbSq",
        "outputId": "98178870-393e-42ca-c1d6-9c91239ddabe"
      },
      "execution_count": null,
      "outputs": [
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[{'id': '66954',\n",
              "  'text': 'Boston Red Sox make history Believe it, New England -- the Boston Red Sox are in the World Series. And they got there with the most unbelievable comeback of all, with four sweet swings after decades of defeat, shaming the dreaded New York Yankees.',\n",
              "  'md5': '29084f8640d4d72e402e991bc9fdbfa0',\n",
              "  'score': 21.451942},\n",
              " {'id': '69577',\n",
              "  'text': 'Passing thoughts on Yankees-Red Sox series The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game. The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game and it wasn #39;t even close.',\n",
              "  'md5': '056983d301975084b49a5987185f2ddf',\n",
              "  'score': 20.923117},\n",
              " {'id': '67253',\n",
              "  'text': 'Sox Victorious At Last!! BOSTON -- After suffering decades of defeat and disappointment, the 2004 Boston Red Sox made history Wednesday night, beating the Yankees in the house that Ruth built and claiming the American League championship trophy.',\n",
              "  'md5': '7838fcf610f0b569829c9bafdf9012f2',\n",
              "  'score': 20.865997}]"
            ]
          },
          "metadata": {},
          "execution_count": 32
        }
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Same results with the additional `md5` column, as expected."
      ],
      "metadata": {
        "id": "zW1NkPpA5V0b"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Explore the data stores\n",
        "\n",
        "The last thing we'll do is see where and how this data is stored in Postgres and Elasticsearch.\n",
        "\n",
        "Let's connect to the local Postgres instance and sample content from the `sections` table."
      ],
      "metadata": {
        "id": "54R2s6CM5a-6"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pWRJ4vNkn2QL",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "fab25634-474f-441d-ef82-cf903b8c8762"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "env: DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres\n",
            "The sql extension is already loaded. To reload it, use:\n",
            "  %reload_ext sql\n"
          ]
        }
      ],
      "source": [
        "%env DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres\n",
        "%load_ext sql"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "axy-m3JMn9H8",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 175
        },
        "outputId": "6d7a523f-722e-4139-a4f2-db30f8a8f877"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            " * postgresql://postgres:***@localhost:5432/postgres\n",
            "3 rows affected.\n"
          ]
        },
        {
          "output_type": "execute_result",
          "data": {
            "text/plain": [
              "[('66954', 'Boston Red Sox make history Believe it, New England -- the Boston Red Sox are in the World Series. And they got there with the most unbelievable comeback of all, with four sweet swings after decades of defeat, shaming the dreaded New York Yankees.'),\n",
              " ('62732', \"BoSox, Astros Play for Crucial Game 4 Wins (AP) AP - The Boston Red Sox entered this AL championship series hoping to finally overcome their bitter r ... (50 characters truncated) ... n-game defeat last October. Instead, they've been reduced to trying to prevent the Yankees from completing a humiliating sweep in their own ballpark.\"),\n",
              " ('62752', \"BoSox, Astros Play for Crucial Game 4 Wins The Boston Red Sox entered this AL championship series hoping to finally overcome their bitter rivals from ... (42 characters truncated) ... game defeat last October. Instead, they've been reduced to trying to prevent the Yankees from completing a humiliating sweep in their own ballpark...\")]"
            ],
            "text/html": [
              "<table>\n",
              "    <thead>\n",
              "        <tr>\n",
              "            <th>id</th>\n",
              "            <th>text</th>\n",
              "        </tr>\n",
              "    </thead>\n",
              "    <tbody>\n",
              "        <tr>\n",
              "            <td>66954</td>\n",
              "            <td>Boston Red Sox make history Believe it, New England -- the Boston Red Sox are in the World Series. And they got there with the most unbelievable comeback of all, with four sweet swings after decades of defeat, shaming the dreaded New York Yankees.</td>\n",
              "        </tr>\n",
              "        <tr>\n",
              "            <td>62732</td>\n",
              "            <td>BoSox, Astros Play for Crucial Game 4 Wins (AP) AP - The Boston Red Sox entered this AL championship series hoping to finally overcome their bitter rivals from New York following a heartbreaking seven-game defeat last October. Instead, they&#x27;ve been reduced to trying to prevent the Yankees from completing a humiliating sweep in their own ballpark.</td>\n",
              "        </tr>\n",
              "        <tr>\n",
              "            <td>62752</td>\n",
              "            <td>BoSox, Astros Play for Crucial Game 4 Wins The Boston Red Sox entered this AL championship series hoping to finally overcome their bitter rivals from New York following a heartbreaking seven-game defeat last October. Instead, they&#x27;ve been reduced to trying to prevent the Yankees from completing a humiliating sweep in their own ballpark...</td>\n",
              "        </tr>\n",
              "    </tbody>\n",
              "</table>"
            ]
          },
          "metadata": {},
          "execution_count": 34
        }
      ],
      "source": [
        "%%sql\n",
        "select id, text from sections where text like '%Red Sox%' and text like '%Yankees%' and text like '%defeat%' limit 3;"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "As expected, we can see content stored directly in Postgres!\n",
        "\n",
        "Now let's check Elasticsearch."
      ],
      "metadata": {
        "id": "6JfMoZPB5uZo"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "X6A7GP7ZBm_v",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "151df22c-6d43-4599-eba6-a579fe437677"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "{\n",
            "  \"took\": 13,\n",
            "  \"timed_out\": false,\n",
            "  \"_shards\": {\n",
            "    \"total\": 1,\n",
            "    \"successful\": 1,\n",
            "    \"skipped\": 0,\n",
            "    \"failed\": 0\n",
            "  },\n",
            "  \"hits\": {\n",
            "    \"total\": {\n",
            "      \"value\": 3297,\n",
            "      \"relation\": \"eq\"\n",
            "    },\n",
            "    \"max_score\": 21.451942,\n",
            "    \"hits\": [\n",
            "      {\n",
            "        \"_index\": \"testindex\",\n",
            "        \"_type\": \"_doc\",\n",
            "        \"_id\": \"66954\",\n",
            "        \"_score\": 21.451942,\n",
            "        \"_source\": {\n",
            "          \"text\": \"Boston Red Sox make history Believe it, New England -- the Boston Red Sox are in the World Series. And they got there with the most unbelievable comeback of all, with four sweet swings after decades of defeat, shaming the dreaded New York Yankees.\"\n",
            "        }\n",
            "      },\n",
            "      {\n",
            "        \"_index\": \"testindex\",\n",
            "        \"_type\": \"_doc\",\n",
            "        \"_id\": \"69577\",\n",
            "        \"_score\": 20.923117,\n",
            "        \"_source\": {\n",
            "          \"text\": \"Passing thoughts on Yankees-Red Sox series The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game. The Red Sox beat the Yankees at Yankee Stadium in a season-deciding game and it wasn #39;t even close.\"\n",
            "        }\n",
            "      },\n",
            "      {\n",
            "        \"_index\": \"testindex\",\n",
            "        \"_type\": \"_doc\",\n",
            "        \"_id\": \"67253\",\n",
            "        \"_score\": 20.865997,\n",
            "        \"_source\": {\n",
            "          \"text\": \"Sox Victorious At Last!! BOSTON -- After suffering decades of defeat and disappointment, the 2004 Boston Red Sox made history Wednesday night, beating the Yankees in the house that Ruth built and claiming the American League championship trophy.\"\n",
            "        }\n",
            "      }\n",
            "    ]\n",
            "  }\n",
            "}\n"
          ]
        }
      ],
      "source": [
        "import json\n",
        "import requests\n",
        "\n",
        "response = requests.get(\"http://localhost:9200/_search?q=red+sox+defeat+yankees&size=3\")\n",
        "print(json.dumps(response.json(), indent=2))"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "Same query results as what was run through the embeddings database.\n",
        "\n",
        "Let's save the embeddings database and review what's stored."
      ],
      "metadata": {
        "id": "6sS0ym5Y59w9"
      }
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "BsFNRvRzoj9T",
        "colab": {
          "base_uri": "https://localhost:8080/"
        },
        "outputId": "38bef15f-2ce3-4c8a-9bd0-3d7d0c84d794"
      },
      "outputs": [
        {
          "output_type": "stream",
          "name": "stdout",
          "text": [
            "total 4\n",
            "-rw-r--r-- 1 root root 155 Sep  7 16:39 config\n"
          ]
        }
      ],
      "source": [
        "embeddings.save(\"elastic\")\n",
        "!ls -l elastic"
      ]
    },
    {
      "cell_type": "markdown",
      "source": [
        "And all we have is the configuration. No `database`, `embeddings` or `scoring` files. That data is in Postgres and Elasticsearch!"
      ],
      "metadata": {
        "id": "OAtoEmV36FWp"
      }
    },
    {
      "cell_type": "markdown",
      "source": [
        "# Wrapping up\n",
        "\n",
        "This notebook showed how external databases and other external integrations can be used with embeddings databases. This architecture ensures that as new ways to index and store data become available, txtai can easily adapt.\n",
        "\n",
        "This notebook also showed how creating a custom component is a low level of effort and can easily be done for a component without an existing integration.\n"
      ],
      "metadata": {
        "id": "SQxnieiT6RTe"
      }
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
